Basic Quantitative Characteristics of the Modern Greek Language Using the Hellenic National Corpus
Abstract
Modern Greek is one of the least quantitatively studied modern European languages, and the goal of this paper is to fill this relative void. We use the Hellenic National Corpus (HNC), which is a growing corpus that currently includes 33 million words. The corpus and all the tools used in our work were developed by the Institute for Language and Speech Processing (ILSP). In this paper we focus on three main areas: the lists of the 1000 most common words and lemmas, word length, and letter frequency. We also make some comparisons with earlier work, in which we had used the previous 13-million-word edition of the HNC.
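The three measures named in the abstract (most common words, word length, letter frequency) can be illustrated with a minimal sketch. This is not the ILSP tooling, which the abstract does not describe. The function names `strip_accents` and `basic_stats` and the sample sentence are hypothetical, and folding accented vowels into their base letter is an assumption, not necessarily the paper's convention. Lemma counts are omitted because they would require a lemmatizer.

```python
from collections import Counter
import re
import unicodedata


def strip_accents(text):
    """Drop combining marks so that, e.g., 'ί' is counted as 'ι' (an assumption)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def basic_stats(text, top_n=3):
    """Return (most common words, mean word length, letter counts) for a text."""
    # Normalize to composed form first so accented letters are single characters.
    text = unicodedata.normalize("NFC", text).lower()
    # A word is a run of letters; digits, underscores and punctuation are skipped.
    words = re.findall(r"[^\W\d_]+", text)
    word_freq = Counter(words)
    mean_len = sum(len(w) for w in words) / len(words) if words else 0.0
    # Final sigma 'ς' and medial 'σ' are counted as separate letters here.
    letters = Counter(ch for w in words for ch in strip_accents(w))
    return word_freq.most_common(top_n), mean_len, letters


# Hypothetical sample sentence, not taken from the HNC.
sample = "το παιδί και το σκυλί"
top_words, mean_len, letters = basic_stats(sample)
print(top_words)
print(round(mean_len, 2))
print(letters.most_common(3))
```

On a real corpus the same counters would be filled from a stream of texts rather than a single string, and the top 1000 entries of the word counter would correspond to the kind of frequency list the abstract mentions.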
Similar resources
Quantitative parameters in corpus design: Estimating the optimum text size in Modern Greek language
The aim of this paper is to investigate the major quantitative parameters related to the definition of the optimum text size in Modern Greek corpus development. Using the Hellenic National Corpus (HNC) (Hatzigeorgiu et al., 2000) as a reference point we estimated a number of critical statistical measures regarding feature counting in different text sizes. The results indicate that frequent ling...
Design and Implementation of the Online ILSP Greek Corpus
This paper presents the Hellenic National Corpus (HNC), which is the corpus of Modern Greek developed by the Institute for Language and Speech Processing (ILSP). The presentation describes all stages of the creation of the corpus: collection of the material, tagging and tokenizing, construction of the database and the online implementation which aims at rendering the corpus accessible over the Internet to...
Tribalism & Racism among the Ancient Greeks A Weberian Perspective
Were the ancient Greeks “racists” in the modern sense of the term “racist”? The terms ancient Greek “proto-racism”, tribalism (and/or racism) are used here to denote the abstract, narcissistic notion that not only the non-Greek barbarians, but also certain ancient Greek tribes (like the Macedonians, the Boeotians etc.) should be excluded from the Hellenic community, for they were considered to...
Metalanguage or bidialectism? Acquisition of clitic placement by Hellenic Greeks, Greek Cypriots and binationals in the diglossic context of Cyprus
Acquisition of object clitics is one of the more investigated aspects of the largely understudied variety of Modern Greek spoken in the Republic of Cyprus. Previous studies on the acquisition of clitics in Cypriot Greek usually acknowledge that the linguistic reality in Cyprus involves a state of diglossia, where the sociolinguistically ‘high’ Standard Modern Greek co-exists with the ‘low’ Cypr...
A Finite-State Approach to the Computational Morphology of Early Modern Greek
We present a finite-state approach to the computational morphology of early Modern Greek that improves the efficiency of searching and accessing the “Politimo” corpus, which consists of Greek documents printed during the 17th and 18th centuries. Computational morphologies provide users the ability to search documents using only a word root and locate all the corresponding inflected words. Ke...
Journal title:
- Journal of Quantitative Linguistics
Volume 12
Publication date: 2005